Skip to content

Comments

Set topic parameters#406

Merged
mergify[bot] merged 18 commits intosigp:unstablefrom
diegomrsantos:set-topic-parameters
Jul 9, 2025
Merged

Set topic parameters#406
mergify[bot] merged 18 commits intosigp:unstablefrom
diegomrsantos:set-topic-parameters

Conversation

@diegomrsantos
Copy link
Member

@diegomrsantos diegomrsantos commented Jul 3, 2025

Issue Addressed

#371

Proposed Changes

This PR replaces the old subnet_tracker module with a new subnet_service that tracks subnets and calculates dynamic message rates on epoch boundaries. It updates downstream consumers (network, client, scoring) to use the new API and pre-calculated rates.

  • Introduces subnet_service crate with dynamic message-rate computation and epoch scheduling
  • Updates topic-scoring API to accept pre-calculated rates (new_with_rate, topic_score_params_for_subnet_with_rate)
  • Refactors all crates (network, client, message_sender, fuzz, peer_manager, handshake, discovery) to depend on subnet_service instead of subnet_tracker
  • creates disable_gossipsub_topic_scoring and sets it to falseby default

Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.

@diegomrsantos diegomrsantos self-assigned this Jul 3, 2025
@dknopik
Copy link
Member

dknopik commented Jul 4, 2025

When adding the subnet logic, we decided to separate it into the subnet_service in order to avoid using the network state in the network code. This has had the advantage of avoiding blocking on the lock in the network loop and separating the application logic from the network layer.

What do you think of moving some of the logic here into the subnet_service? Either the full calculation, introducing a UpdateMessageRate(Topic, Rate) event, or just TopicUpdated(Topic, Vec<CommitteeInfo>) which would trigger the actual calculation in the network crate.

@diegomrsantos diegomrsantos force-pushed the set-topic-parameters branch from 3b2f0d6 to 7516535 Compare July 4, 2025 10:19
@diegomrsantos diegomrsantos requested a review from Copilot July 4, 2025 10:20

This comment was marked as outdated.

@diegomrsantos
Copy link
Member Author

When adding the subnet logic, we decided to separate it into the subnet_service in order to avoid using the network state in the network code. This has had the advantage of avoiding blocking on the lock in the network loop and separating the application logic from the network layer.

What do you think of moving some of the logic here into the subnet_service? Either the full calculation, introducing a UpdateMessageRate(Topic, Rate) event, or just TopicUpdated(Topic, Vec<CommitteeInfo>) which would trigger the actual calculation in the network crate.

True, the current implementation needs to be changed. How about?

pub enum SubnetEvent {
    Join(SubnetId, Vec<CommitteeInfo>),
    Leave(SubnetId),
}

@dknopik
Copy link
Member

dknopik commented Jul 4, 2025

@diegomrsantos

This would require us to send Join again to pass the committee info if it changes. While this may simplify some handling, an explicit Update(SubnetId, Vec<CommitteeInfo>) might be a bit clearer. I am on the fence about which is better, so feel free to try out what fits best.

@diegomrsantos
Copy link
Member Author

diegomrsantos commented Jul 4, 2025

@diegomrsantos

This would require us to send Join again to pass the committee info if it changes. While this may simplify some handling, an explicit Update(SubnetId, Vec<CommitteeInfo>) might be a bit clearer. I am on the fence about which is better, so feel free to try out what fits best.

How about both events with the vec? But the second one maybe UpdateScore

@dknopik
Copy link
Member

dknopik commented Jul 4, 2025

I am also unsure about where we should place the actual calculation of the rate - it kind of fits both crates IMO, depending how you look at it. It feels like an application specific calculation (just like the decision what subnets to subscribe to), and maybe should therefore also live in the subnet_service, with just the message rate being passed to network. But I am kind of torn on that as well, so I leave it up to you

@diegomrsantos
Copy link
Member Author

I am also unsure about where we should place the actual calculation of the rate - it kind of fits both crates IMO, depending how you look at it. It feels like an application specific calculation (just like the decision what subnets to subscribe to), and maybe should therefore also live in the subnet_service, with just the message rate being passed to network. But I am kind of torn on that as well, so I leave it up to you

Good observation. I think I'll move message_rate to the subnet_service. Thanks!

@diegomrsantos diegomrsantos marked this pull request as ready for review July 4, 2025 13:33
@diegomrsantos diegomrsantos requested review from Copilot and dknopik July 4, 2025 13:33

This comment was marked as outdated.

@diegomrsantos diegomrsantos force-pushed the set-topic-parameters branch from 6135908 to 222eaea Compare July 4, 2025 14:36
@diegomrsantos diegomrsantos requested a review from Copilot July 4, 2025 14:41

This comment was marked as outdated.

@dknopik
Copy link
Member

dknopik commented Jul 4, 2025

The current state does not update the params on updated committee data, right? Also, based on the the discussion above, we have to adjust the events passed to network

@diegomrsantos diegomrsantos force-pushed the set-topic-parameters branch from 222eaea to 21e561b Compare July 4, 2025 16:32
Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 80 to 84
// For the "all subnets" case, we don't have specific committee info, so pass an empty
// vec
if let Err(err) = tx.try_send(SubnetEvent::Join(subnet, Vec::new())) {
error!(?err, "Impossible error while subscribing to all subnets");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still provide information on the committee in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 100 to 103
// `previous_subnets` tracks which subnets were joined in the last iteration.
let mut previous_subnets = HashSet::new();
// Track committee info hash for each subnet to detect changes efficiently
let mut previous_committee_hashes: HashMap<SubnetId, u64> = HashMap::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these can be unified by hashing subnets, not committees.

But regardless, can we not just calculate the score and compare if the old score is different from the new score?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you guys probably know better than me, but I think this approach maintains cleaner boundaries: subnet service handles membership, network handles scoring. Wouldn't hashing the subnet mean we cant distinguish joining and committee changes? I dont mind either way just my 0.02.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my comment above is incorrect, we are hashing subnets here. Still, the set and the map can be unified: After an iteration, a subnet is always either in both previous_subnets and
previous_committee_hashes, or neither. So previous_subnets does not provide value anymore, unless I am missing something.

I think this approach maintains cleaner boundaries

IMO it does not. If we have the network crate calculate the message rate, it needs access to the network state as provided by the database, which we avoided until now. This means one more thing passed into it, one more place where we may wait on a lock, and more domain specific logic (the message rate calculation algorithm) inside it. Note that by inside it I mean the code flow, not the location of the file where the message rate is being calculated.

Basically, the way I think of it is: the network crate handles scoring and subnet membership based on the parameters it is passed. It does not care about how we decide about which subnets to describe on, or how we calculate the subnet message rate, as it does not have any domain logic. We do the same approach for messages: it immediately passes them to other components to avoid any domain logic computations in the code flow of the network loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you unified the hashmap and set could you distinguish for committeeupdate events for committee information changes not just when we join/leave subnet? If we can all good. Thanks for the clarification and dont disagree - the separation as it is here is what I was referring to as a preference and dont want to add state dependencies. Apologies if I oversimplified.

Comment on lines 232 to 251
/// Compute a lightweight hash of committee information to detect changes efficiently
fn compute_committee_hash(committees: &[CommitteeInfo]) -> u64 {
let mut hasher = std::collections::hash_map::DefaultHasher::new();

// Hash the number of committees first
committees.len().hash(&mut hasher);

// Hash each committee's essential data
for committee in committees {
// Hash committee members by converting to a sorted vector
let mut members: Vec<_> = committee.committee_members.iter().collect();
members.sort_unstable();
members.hash(&mut hasher);

// Hash validator indices
committee.validator_indices.hash(&mut hasher);
}

hasher.finish()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be interesting to check if this is considerably faster than just calculating the score.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recalculating is faster than computing the hashes, so I'll recalculate everything at every epoch

@dknopik dknopik requested a review from jking-aus July 4, 2025 21:00
@dknopik dknopik added the v0.2.0 Second testnet release label Jul 4, 2025
jking-aus
jking-aus previously approved these changes Jul 7, 2025
Copy link
Member

@jking-aus jking-aus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - just address daniel's comments

@diegomrsantos diegomrsantos requested a review from jking-aus July 8, 2025 09:09
Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Only a few nitpicks left

Comment on lines 82 to 97
let current_state = db.borrow();
for subnet in (0..(subnet_count as u64)).map(SubnetId) {
let committees_info = get_committee_info_for_subnet(&subnet, &*current_state);
let message_rate = message_rate::calculate_message_rate_for_topic::<E>(
&committees_info,
&chain_spec,
);

if let Err(err) = tx.try_send(SubnetEvent::Join(subnet, message_rate)) {
error!(
?err,
subnet = *subnet,
"Failed to send subnet join event during initialization"
);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this only runs on startup - so we never update the rate when in --subscribe-all-subnets mode

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@diegomrsantos diegomrsantos requested a review from Copilot July 8, 2025 17:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Replaces the legacy subnet_tracker with a new subnet_service crate that tracks subnet membership and computes dynamic gossipsub message rates at epoch boundaries, and updates all downstream consumers to use the new API and pre-calculated rates.

  • Introduces subnet_service with dynamic message-rate computation and epoch scheduling
  • Updates network, client, scoring, and message-sender modules to use subnet_service and pre-calculated rates
  • Removes the old subnet_tracker crate and adjusts workspace dependencies

Reviewed Changes

Copilot reviewed 23 out of 25 changed files in this pull request and generated 1 comment.

File Description
anchor/subnet_tracker/src/lib.rs Deleted old subnet_tracker implementation
anchor/subnet_service/src/lib.rs Added new subnet_service with epoch-based rate updates
anchor/network/src/network.rs Adapted Network to subscribe with optional rates and handle RateUpdate
anchor/client/src/lib.rs Switched from start_subnet_tracker to start_subnet_service
Comments suppressed due to low confidence (1)

anchor/subnet_service/src/lib.rs:256

  • [nitpick] The handle_epoch_committee_update logic emits RateUpdate events every epoch—this is critical dynamic behavior. Consider adding unit or integration tests for this function to verify that message rates are recalculated and sent correctly when an epoch boundary is reached.
async fn handle_epoch_committee_update<E: EthSpec>(

Comment on lines +302 to +307
) -> Vec<CommitteeInfo> {
network_state
.clusters()
.values()
.filter(|cluster| {
let cluster_subnet = SubnetId::from_committee(cluster.committee_id(), SUBNET_COUNT);
Copy link

Copilot AI Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In get_committee_info_for_subnet, using the constant SUBNET_COUNT can lead to inconsistencies if the service is initialized with a different subnet_count. Consider passing the runtime subnet_count into this helper instead of the fixed constant to ensure correct committee-to-subnet mapping.

Suggested change
) -> Vec<CommitteeInfo> {
network_state
.clusters()
.values()
.filter(|cluster| {
let cluster_subnet = SubnetId::from_committee(cluster.committee_id(), SUBNET_COUNT);
subnet_count: usize,
) -> Vec<CommitteeInfo> {
network_state
.clusters()
.values()
.filter(|cluster| {
let cluster_subnet = SubnetId::from_committee(cluster.committee_id(), subnet_count);

Copilot uses AI. Check for mistakes.
@diegomrsantos diegomrsantos requested a review from dknopik July 8, 2025 17:52
@dknopik
Copy link
Member

dknopik commented Jul 8, 2025

Neat! Thanks!

One more round of testing over night, fingers crossed!

@diegomrsantos
Copy link
Member Author

Neat! Thanks!

One more round of testing over night, fingers crossed!

The topic params are disabled in the current code

Comment on lines +155 to +168
tokio::select! {
// Handle database changes for subnet join/leave (only if not subscribe_all_subnets)
_ = db.changed(), if !subscribe_all_subnets => {
handle_subnet_changes::<E>(&tx, &mut db, &mut previous_subnets, subnet_count, &chain_spec, disable_gossipsub_topic_scoring).await;
}

// Handle scheduled epoch boundaries (for both modes, but only if scoring is enabled)
_ = sleep(next_epoch_delay), if !disable_gossipsub_topic_scoring => {
handle_epoch_committee_update::<E>(&tx, &mut db, &previous_subnets, &chain_spec).await;
// Recalculate the next epoch delay only after we've processed the epoch boundary
next_epoch_delay = calculate_duration_to_next_epoch::<E>(&slot_clock);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select panics if all branches are disabled via if, unless an else branch is provided

I suggest adding an else branch with return, as the network crate should handle this gracefully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines +60 to +61
/// Disables gossipsub topic scoring and message rate calculations.
pub disable_gossipsub_topic_scoring: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add a CLI flag to re-enable this, maybe with a note that it is experimental?

# Conflicts:
#	Cargo.lock
#	anchor/fuzz/Cargo.toml
#	anchor/fuzz/fuzz_targets/setup.rs
@diegomrsantos diegomrsantos force-pushed the set-topic-parameters branch from 92a4596 to 5bd856a Compare July 9, 2025 13:03
@dknopik dknopik force-pushed the set-topic-parameters branch from 48c8ca4 to 5bd856a Compare July 9, 2025 13:19
@diegomrsantos diegomrsantos force-pushed the set-topic-parameters branch from 5bd856a to d193578 Compare July 9, 2025 13:31
Copy link
Member

@dknopik dknopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for tackling this!

@mergify mergify bot merged commit 4df75e0 into sigp:unstable Jul 9, 2025
14 checks passed
@diegomrsantos
Copy link
Member Author

LGTM, thank you for tackling this!

Thank you for the great review

mergify bot pushed a commit that referenced this pull request Jul 10, 2025
This fixes the issues arising with #406. The root cause is as follows:

We send messages anonymously, setting a random author ID through `MessageAuthenticity::RandomAuthor`. Libp2p by default penalizes getting sent messages that originate from ourselves. However, as we do not track under which author we published each message, libp2p simply punishes messages that we have published, identified by message ID. The message ID depends only on the message contents.

The famed "decided message" is generated after a QBFT instance completes - and as it simply is an aggregation of existing signatures and messages, it can be generated by all participants and actually ends up having identical content in many cases. Nodes generate this message at the same time and send it almost simultaneously.

This causes nodes to be penalized for "self" messages, which are not actually self messages.


  Disable penalizing getting sent messages sent by "self" - which may actually be generated by someone else but happens to be identical.
diegomrsantos added a commit to diegomrsantos/anchor that referenced this pull request Sep 17, 2025
sigp#371


  This PR replaces the old `subnet_tracker` module with a new `subnet_service` that tracks subnets and calculates dynamic message rates on epoch boundaries. It updates downstream consumers (network, client, scoring) to use the new API and pre-calculated rates.

- Introduces `subnet_service` crate with dynamic message-rate computation and epoch scheduling
- Updates topic-scoring API to accept pre-calculated rates (`new_with_rate`, `topic_score_params_for_subnet_with_rate`)
- Refactors all crates (network, client, message_sender, fuzz, peer_manager, handshake, discovery) to depend on `subnet_service` instead of `subnet_tracker`
- creates `disable_gossipsub_topic_scoring` and sets it to `false`by default
diegomrsantos pushed a commit to diegomrsantos/anchor that referenced this pull request Sep 17, 2025
This fixes the issues arising with sigp#406. The root cause is as follows:

We send messages anonymously, setting a random author ID through `MessageAuthenticity::RandomAuthor`. Libp2p by default penalizes getting sent messages that originate from ourselves. However, as we do not track under which author we published each message, libp2p simply punishes messages that we have published, identified by message ID. The message ID depends only on the message contents.

The famed "decided message" is generated after a QBFT instance completes - and as it simply is an aggregation of existing signatures and messages, it can be generated by all participants and actually ends up having identical content in many cases. Nodes generate this message at the same time and send it almost simultaneously.

This causes nodes to be penalized for "self" messages, which are not actually self messages.


  Disable penalizing getting sent messages sent by "self" - which may actually be generated by someone else but happens to be identical.
jking-aus pushed a commit to jking-aus/anchor that referenced this pull request Oct 8, 2025
sigp#371


  This PR replaces the old `subnet_tracker` module with a new `subnet_service` that tracks subnets and calculates dynamic message rates on epoch boundaries. It updates downstream consumers (network, client, scoring) to use the new API and pre-calculated rates.

- Introduces `subnet_service` crate with dynamic message-rate computation and epoch scheduling
- Updates topic-scoring API to accept pre-calculated rates (`new_with_rate`, `topic_score_params_for_subnet_with_rate`)
- Refactors all crates (network, client, message_sender, fuzz, peer_manager, handshake, discovery) to depend on `subnet_service` instead of `subnet_tracker`
- creates `disable_gossipsub_topic_scoring` and sets it to `false`by default
jking-aus pushed a commit to jking-aus/anchor that referenced this pull request Oct 8, 2025
This fixes the issues arising with sigp#406. The root cause is as follows:

We send messages anonymously, setting a random author ID through `MessageAuthenticity::RandomAuthor`. Libp2p by default penalizes getting sent messages that originate from ourselves. However, as we do not track under which author we published each message, libp2p simply punishes messages that we have published, identified by message ID. The message ID depends only on the message contents.

The famed "decided message" is generated after a QBFT instance completes - and as it simply is an aggregation of existing signatures and messages, it can be generated by all participants and actually ends up having identical content in many cases. Nodes generate this message at the same time and send it almost simultaneously.

This causes nodes to be penalized for "self" messages, which are not actually self messages.


  Disable penalizing getting sent messages sent by "self" - which may actually be generated by someone else but happens to be identical.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants